Skip to content

fix(bedrock): wrap user text/image in guardContent to prevent tool result false positives#1886

Open
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-tool-result-false-positive
Open

fix(bedrock): wrap user text/image in guardContent to prevent tool result false positives#1886
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-tool-result-false-positive

Conversation

@giulio-leone
Copy link
Contributor

Summary

Fixes #1671

When Bedrock guardrails are enabled, tool results stored with role: "user" can trigger false-positive prompt injection detections. For example, a tool returning "You are Test Admin User." gets flagged as a prompt injection attack on subsequent messages.

Root Cause

Without guardrail_latest_message=True, no content blocks are wrapped in guardContent. The Bedrock guardrail then evaluates all message content — including tool results that happen to have role: "user" — leading to false positives on system-generated content.

Even with the existing _find_last_user_text_message_index fix (PR #1658), this only activates when guardrail_latest_message=True. The default behavior (False) leaves tool results exposed to guardrail scanning.

Fix

When guardrails are enabled (guardrail_id + guardrail_version are set), all user text and image content blocks are wrapped in guardContent. This signals the guardrail to evaluate only those blocks, excluding tool results (which contain toolResult blocks, not text/image) from scanning.

Behavior

guardrail_latest_message Before After
True Only latest user text wrapped Unchanged
False (default) No wrapping — guardrail scans everything All user text/image wrapped — tool results excluded

Tests

  • Added test_format_request_guardrail_default_wraps_all_user_text — verifies all user text is wrapped when guardrails enabled
  • Added test_format_request_guardrail_default_excludes_tool_results — reproduces the exact scenario from [BUG] Bedrock Guardrail False Positive on Tool Results #1671
  • Added test_format_request_no_guardrail_no_wrapping — verifies no wrapping without guardrails
  • Updated 2 existing config tests to reflect new wrapping behavior
  • All 128 bedrock tests pass

@giulio-leone
Copy link
Contributor Author

Friendly ping — wraps user text/image content in guardContent format for Bedrock, preventing guardrail checks from treating tool results as user input.

…sult false positives

When guardrails are enabled, tool results (role='user') containing text like
'You are Test Admin User.' can trigger false-positive prompt injection
detections because the guardrail treats them as user input.

The fix wraps ALL user text/image content blocks in guardContent when
guardrails are enabled (not just when guardrail_latest_message=True).
This signals the guardrail to evaluate ONLY those blocks, excluding
tool results from scanning.

Behavior change:
- guardrail_latest_message=True: unchanged (only latest user text wrapped)
- guardrail_latest_message=False (default): all user text/image wrapped,
  tool results excluded from guardrail scanning

Closes strands-agents#1671
@giulio-leone
Copy link
Contributor Author

Refreshed onto main @ fd8168a (v1.32.0+2) — 2026-03-23

Root cause confirmed still live: The guardrail wrapping logic only applies when guardrail_latest_message=True (opt-in). Without it, tool results (which also carry role="user") are sent to Bedrock guardrails unwrapped — causing false-positive prompt injection detections on system-generated content (see #1671).

Fix: Introduce has_guardrail pre-check (guardrail_id + guardrail_version both present). When guardrail_latest_message=False (the default), wrap all user text/image blocks in guardContent — but never wrap toolResult blocks. When guardrail_latest_message=True, preserve the existing "only latest user message" behavior.

Runtime proof on rebased branch ef07544:

role=user  block=text        guardrail_wrapped=True   ← user message correctly wrapped
role=user  block=toolResult  guardrail_wrapped=False  ← tool result correctly excluded

Test results:

  • 25 targeted guardrail/guard-content tests: 25/25 PASSED
  • Full Bedrock model test suite: 126/126 PASSED

@giulio-leone giulio-leone force-pushed the fix/guardrail-tool-result-false-positive branch from 6114956 to ef07544 Compare March 23, 2026 06:10
@github-actions github-actions bot added size/m and removed size/m labels Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Bedrock Guardrail False Positive on Tool Results

1 participant